🐁Mice Trisomy Data Analysis🔎

🧬(Prediction at the end)🔮

Context

Expression levels of 77 proteins measured in the cerebral cortex of 8 classes of control and Down syndrome mice exposed to context fear conditioning, a task used to assess associative learning.

Content

The data set consists of the expression levels of 77 proteins/protein modifications that produced detectable signals in the nuclear fraction of the cortex. There are 38 control mice and 34 trisomic mice (Down syndrome), for a total of 72 mice. In the experiments, 15 measurements of each protein were registered per sample/mouse. Therefore, there are 38 × 15 = 570 measurements for control mice and 34 × 15 = 510 for trisomic mice, giving 1080 measurements per protein in total. Each measurement can be considered as an independent sample/mouse.
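The row counts stated above can be verified with a line of arithmetic:

```python
# Sanity-check the measurement counts described above.
n_control, n_trisomic, n_measurements = 38, 34, 15
control_rows = n_control * n_measurements    # measurements from control mice
trisomic_rows = n_trisomic * n_measurements  # measurements from trisomic mice
total_rows = control_rows + trisomic_rows    # total rows in the dataset
```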

The eight classes of mice are defined by three factors: genotype, behavior, and treatment. By genotype, mice are either control or trisomic. By behavior, some mice were stimulated to learn (context-shock) and others were not (shock-context). Finally, to assess the effect of the drug memantine in recovering the ability to learn in trisomic mice, some mice were injected with the drug and others were not.

Exploratory Data Analysis

Aim:

  • Understand the data ("A small step forward is better than a big one backwards")
  • Begin to develop a modelling strategy

Target

Classes:

c-CS-s: control mice, stimulated to learn, injected with saline (9 mice)

c-CS-m: control mice, stimulated to learn, injected with memantine (10 mice)

c-SC-s: control mice, not stimulated to learn, injected with saline (9 mice)

c-SC-m: control mice, not stimulated to learn, injected with memantine (10 mice)

t-CS-s: trisomy mice, stimulated to learn, injected with saline (7 mice)

t-CS-m: trisomy mice, stimulated to learn, injected with memantine (9 mice)

t-SC-s: trisomy mice, not stimulated to learn, injected with saline (9 mice)

t-SC-m: trisomy mice, not stimulated to learn, injected with memantine (9 mice)

Features

[1] Mouse ID

[2:78] Expression levels of the 77 proteins; each protein name carries the suffix _N, indicating that it was measured in the nuclear fraction. For example: DYRK1A_N

[79] Genotype: control (c) or trisomy (t)

[80] Treatment type: memantine (m) or saline (s)

[81] Behavior: context-shock (CS) or shock-context (SC)

[82] Class: c-CS-s, c-CS-m, c-SC-s, c-SC-m, t-CS-s, t-CS-m, t-SC-s, t-SC-m
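The class label is just a concatenation of the three factor columns. As a minimal sketch, a hypothetical helper (`make_class`, not part of the dataset) makes the coding explicit, using the value sets listed above (Genotype `Control`/`Ts65Dn`, Behavior `C/S`/`S/C`, Treatment `Memantine`/`Saline`):

```python
def make_class(genotype, behavior, treatment):
    """Rebuild the class label (e.g. 'c-CS-m') from the three factor columns."""
    g = 'c' if genotype == 'Control' else 't'       # control vs trisomic
    b = 'CS' if behavior == 'C/S' else 'SC'         # context-shock vs shock-context
    t = 'm' if treatment == 'Memantine' else 's'    # memantine vs saline
    return f'{g}-{b}-{t}'
```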

Base Checklist

Shape Analysis:

  • target feature : Class
  • rows and columns : 1080, 82
  • feature types : 5 qualitative, 77 quantitative
  • NaN analysis :
    • 5 features have more than 15 % NaN; all others are below 7 %

Columns Analysis:

  • Target Analysis :
    • Balanced (Yes/No) : roughly, yes
    • Percentages : from ~9.7 % to ~13.9 % per class
  • Categorical values
    • There are 4 categorical features (not including the target)
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
In [2]:
data = pd.read_csv('../input/mice-protein-expression/Data_Cortex_Nuclear.csv')
df = data.copy()
pd.set_option('display.max_row',df.shape[0])
pd.set_option('display.max_column',df.shape[1]) 
df.head()
Out[2]:
MouseID DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N pELK_N pERK_N pJNK_N PKCA_N pMEK_N pNR1_N pNR2A_N pNR2B_N pPKCAB_N pRSK_N AKT_N BRAF_N CAMKII_N CREB_N ELK_N ERK_N GSK3B_N JNK_N MEK_N TRKA_N RSK_N APP_N Bcatenin_N SOD1_N MTOR_N P38_N pMTOR_N DSCR1_N AMPKA_N NR2B_N pNUMB_N RAPTOR_N TIAM1_N pP70S6_N NUMB_N P70S6_N pGSK3B_N pPKCG_N CDK5_N S6_N ADARB1_N AcetylH3K9_N RRP1_N BAX_N ARC_N ERBB4_N nNOS_N Tau_N GFAP_N GluR3_N GluR4_N IL1B_N P3525_N pCASP9_N PSD95_N SNCA_N Ubiquitin_N pGSK3B_Tyr216_N SHH_N BAD_N BCL2_N pS6_N pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N Genotype Treatment Behavior class
0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 1.750936 0.687906 0.306382 0.402698 0.296927 1.022060 0.605673 1.877684 2.308745 0.441599 0.859366 0.416289 0.369608 0.178944 1.866358 3.685247 1.537227 0.264526 0.319677 0.813866 0.165846 0.453910 3.037621 0.369510 0.458539 0.335336 0.825192 0.576916 0.448099 0.586271 0.394721 0.339571 0.482864 0.294170 0.182150 0.842725 0.192608 1.443091 0.294700 0.354605 1.339070 0.170119 0.159102 0.188852 0.106305 0.144989 0.176668 0.125190 0.115291 0.228043 0.142756 0.430957 0.247538 1.603310 2.014875 0.108234 1.044979 0.831557 0.188852 0.122652 NaN 0.106305 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 1.596377 0.695006 0.299051 0.385987 0.281319 0.956676 0.587559 1.725774 2.043037 0.445222 0.834659 0.400364 0.356178 0.173680 1.761047 3.485287 1.509249 0.255727 0.304419 0.780504 0.157194 0.430940 2.921882 0.342279 0.423560 0.324835 0.761718 0.545097 0.420876 0.545097 0.368255 0.321959 0.454519 0.276431 0.182086 0.847615 0.194815 1.439460 0.294060 0.354548 1.306323 0.171427 0.158129 0.184570 0.106592 0.150471 0.178309 0.134275 0.118235 0.238073 0.142037 0.457156 0.257632 1.671738 2.004605 0.109749 1.009883 0.849270 0.200404 0.116682 NaN 0.106592 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 1.561316 0.677348 0.291276 0.381002 0.281710 1.003635 0.602449 1.731873 2.017984 0.467668 0.814329 0.399847 0.368089 0.173905 1.765544 3.571456 1.501244 0.259614 0.311747 0.785154 0.160895 0.423187 2.944136 0.343696 0.425005 0.324852 0.757031 0.543620 0.404630 0.552994 0.363880 0.313086 0.447197 0.256648 0.184388 0.856166 0.200737 1.524364 0.301881 0.386087 1.279600 0.185456 0.148696 0.190532 0.108303 0.145330 0.176213 0.132560 0.117760 0.244817 0.142445 0.510472 0.255343 1.663550 2.016831 0.108196 0.996848 0.846709 0.193685 0.118508 NaN 0.108303 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 1.595086 0.583277 0.296729 0.377087 0.313832 0.875390 0.520293 1.566852 2.132754 0.477671 0.727705 0.385639 0.362970 0.179449 1.286277 2.970137 1.419710 0.259536 0.279218 0.734492 0.162210 0.410615 2.500204 0.344509 0.429211 0.330121 0.746980 0.546763 0.386860 0.547849 0.366771 0.328492 0.442650 0.398534 0.161768 0.760234 0.184169 1.612382 0.296382 0.290680 1.198765 0.159799 0.166112 0.185323 0.103184 0.140656 0.163804 0.123210 0.117439 0.234947 0.145068 0.430996 0.251103 1.484624 1.957233 0.119883 0.990225 0.833277 0.192112 0.132781 NaN 0.103184 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 1.504230 0.550960 0.286961 0.363502 0.277964 0.864912 0.507990 1.480059 2.013697 0.483416 0.687794 0.367531 0.355311 0.174836 1.324695 2.896334 1.359876 0.250705 0.273667 0.702699 0.154827 0.398550 2.456560 0.329126 0.408755 0.313415 0.691956 0.536860 0.360816 0.512824 0.351551 0.312206 0.419095 0.393447 0.160200 0.768113 0.185718 1.645807 0.296829 0.309345 1.206995 0.164650 0.160687 0.188221 0.104784 0.141983 0.167710 0.136838 0.116048 0.255528 0.140871 0.481227 0.251773 1.534835 2.009109 0.119524 0.997775 0.878668 0.205604 0.129954 NaN 0.104784 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m
In [3]:
df.dtypes.value_counts() # Count the number of columns of each dtype
Out[3]:
float64    77
object      5
dtype: int64
In [4]:
print('There are' , df.shape[0] , 'rows')
print('There are' , df.shape[1] , 'columns')
There are 1080 rows
There are 82 columns
In [5]:
plt.figure(figsize=(10,10))
sns.heatmap(df.isna(),cbar=False)
plt.show()
[Figure: missing-value heatmap of df — light cells mark NaN entries]
In [6]:
(df.isna().sum()/df.shape[0]*100).sort_values(ascending=False)
Out[6]:
BCL2_N             26.388889
H3MeK4_N           25.000000
BAD_N              19.722222
EGR1_N             19.444444
H3AcK18_N          16.666667
pCFOS_N             6.944444
ELK_N               1.666667
Bcatenin_N          1.666667
MEK_N               0.648148
P38_N               0.277778
JNK_N               0.277778
TRKA_N              0.277778
RSK_N               0.277778
SOD1_N              0.277778
MTOR_N              0.277778
RAPTOR_N            0.277778
pMTOR_N             0.277778
DSCR1_N             0.277778
AMPKA_N             0.277778
GSK3B_N             0.277778
pNUMB_N             0.277778
DYRK1A_N            0.277778
TIAM1_N             0.277778
pP70S6_N            0.277778
NR2B_N              0.277778
APP_N               0.277778
ERK_N               0.277778
PKCA_N              0.277778
NR1_N               0.277778
NR2A_N              0.277778
pAKT_N              0.277778
pBRAF_N             0.277778
CREB_N              0.277778
pCAMKII_N           0.277778
pCREB_N             0.277778
pELK_N              0.277778
BDNF_N              0.277778
pJNK_N              0.277778
pERK_N              0.277778
pMEK_N              0.277778
pNR1_N              0.277778
pNR2A_N             0.277778
pNR2B_N             0.277778
pPKCAB_N            0.277778
pRSK_N              0.277778
AKT_N               0.277778
BRAF_N              0.277778
CAMKII_N            0.277778
ITSN1_N             0.277778
SNCA_N              0.000000
Ubiquitin_N         0.000000
pGSK3B_Tyr216_N     0.000000
PSD95_N             0.000000
SHH_N               0.000000
Behavior            0.000000
pS6_N               0.000000
SYP_N               0.000000
CaNA_N              0.000000
Genotype            0.000000
Treatment           0.000000
P3525_N             0.000000
pCASP9_N            0.000000
MouseID             0.000000
IL1B_N              0.000000
AcetylH3K9_N        0.000000
NUMB_N              0.000000
P70S6_N             0.000000
pGSK3B_N            0.000000
pPKCG_N             0.000000
CDK5_N              0.000000
S6_N                0.000000
ADARB1_N            0.000000
RRP1_N              0.000000
GluR4_N             0.000000
BAX_N               0.000000
ARC_N               0.000000
ERBB4_N             0.000000
nNOS_N              0.000000
Tau_N               0.000000
GFAP_N              0.000000
GluR3_N             0.000000
class               0.000000
dtype: float64
In [7]:
exploitable = df.columns[df.isna().sum()/df.shape[0]< 0.70 ] # Keep columns where the fraction of NaN is below 70 %
df = df[exploitable]
df.head()
Out[7]:
MouseID DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N pELK_N pERK_N pJNK_N PKCA_N pMEK_N pNR1_N pNR2A_N pNR2B_N pPKCAB_N pRSK_N AKT_N BRAF_N CAMKII_N CREB_N ELK_N ERK_N GSK3B_N JNK_N MEK_N TRKA_N RSK_N APP_N Bcatenin_N SOD1_N MTOR_N P38_N pMTOR_N DSCR1_N AMPKA_N NR2B_N pNUMB_N RAPTOR_N TIAM1_N pP70S6_N NUMB_N P70S6_N pGSK3B_N pPKCG_N CDK5_N S6_N ADARB1_N AcetylH3K9_N RRP1_N BAX_N ARC_N ERBB4_N nNOS_N Tau_N GFAP_N GluR3_N GluR4_N IL1B_N P3525_N pCASP9_N PSD95_N SNCA_N Ubiquitin_N pGSK3B_Tyr216_N SHH_N BAD_N BCL2_N pS6_N pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N Genotype Treatment Behavior class
0 309_1 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 1.750936 0.687906 0.306382 0.402698 0.296927 1.022060 0.605673 1.877684 2.308745 0.441599 0.859366 0.416289 0.369608 0.178944 1.866358 3.685247 1.537227 0.264526 0.319677 0.813866 0.165846 0.453910 3.037621 0.369510 0.458539 0.335336 0.825192 0.576916 0.448099 0.586271 0.394721 0.339571 0.482864 0.294170 0.182150 0.842725 0.192608 1.443091 0.294700 0.354605 1.339070 0.170119 0.159102 0.188852 0.106305 0.144989 0.176668 0.125190 0.115291 0.228043 0.142756 0.430957 0.247538 1.603310 2.014875 0.108234 1.044979 0.831557 0.188852 0.122652 NaN 0.106305 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 Control Memantine C/S c-CS-m
1 309_2 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 1.596377 0.695006 0.299051 0.385987 0.281319 0.956676 0.587559 1.725774 2.043037 0.445222 0.834659 0.400364 0.356178 0.173680 1.761047 3.485287 1.509249 0.255727 0.304419 0.780504 0.157194 0.430940 2.921882 0.342279 0.423560 0.324835 0.761718 0.545097 0.420876 0.545097 0.368255 0.321959 0.454519 0.276431 0.182086 0.847615 0.194815 1.439460 0.294060 0.354548 1.306323 0.171427 0.158129 0.184570 0.106592 0.150471 0.178309 0.134275 0.118235 0.238073 0.142037 0.457156 0.257632 1.671738 2.004605 0.109749 1.009883 0.849270 0.200404 0.116682 NaN 0.106592 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 Control Memantine C/S c-CS-m
2 309_3 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 1.561316 0.677348 0.291276 0.381002 0.281710 1.003635 0.602449 1.731873 2.017984 0.467668 0.814329 0.399847 0.368089 0.173905 1.765544 3.571456 1.501244 0.259614 0.311747 0.785154 0.160895 0.423187 2.944136 0.343696 0.425005 0.324852 0.757031 0.543620 0.404630 0.552994 0.363880 0.313086 0.447197 0.256648 0.184388 0.856166 0.200737 1.524364 0.301881 0.386087 1.279600 0.185456 0.148696 0.190532 0.108303 0.145330 0.176213 0.132560 0.117760 0.244817 0.142445 0.510472 0.255343 1.663550 2.016831 0.108196 0.996848 0.846709 0.193685 0.118508 NaN 0.108303 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 Control Memantine C/S c-CS-m
3 309_4 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 1.595086 0.583277 0.296729 0.377087 0.313832 0.875390 0.520293 1.566852 2.132754 0.477671 0.727705 0.385639 0.362970 0.179449 1.286277 2.970137 1.419710 0.259536 0.279218 0.734492 0.162210 0.410615 2.500204 0.344509 0.429211 0.330121 0.746980 0.546763 0.386860 0.547849 0.366771 0.328492 0.442650 0.398534 0.161768 0.760234 0.184169 1.612382 0.296382 0.290680 1.198765 0.159799 0.166112 0.185323 0.103184 0.140656 0.163804 0.123210 0.117439 0.234947 0.145068 0.430996 0.251103 1.484624 1.957233 0.119883 0.990225 0.833277 0.192112 0.132781 NaN 0.103184 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 Control Memantine C/S c-CS-m
4 309_5 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 1.504230 0.550960 0.286961 0.363502 0.277964 0.864912 0.507990 1.480059 2.013697 0.483416 0.687794 0.367531 0.355311 0.174836 1.324695 2.896334 1.359876 0.250705 0.273667 0.702699 0.154827 0.398550 2.456560 0.329126 0.408755 0.313415 0.691956 0.536860 0.360816 0.512824 0.351551 0.312206 0.419095 0.393447 0.160200 0.768113 0.185718 1.645807 0.296829 0.309345 1.206995 0.164650 0.160687 0.188221 0.104784 0.141983 0.167710 0.136838 0.116048 0.255528 0.140871 0.481227 0.251773 1.534835 2.009109 0.119524 0.997775 0.878668 0.205604 0.129954 NaN 0.104784 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 Control Memantine C/S c-CS-m

Examining target and features

In [8]:
df['class'].value_counts(normalize=True) # Class proportions (slightly imbalanced)
Out[8]:
c-CS-m    0.138889
c-SC-m    0.138889
c-SC-s    0.125000
t-SC-s    0.125000
t-SC-m    0.125000
t-CS-m    0.125000
c-CS-s    0.125000
t-CS-s    0.097222
Name: class, dtype: float64
In [ ]:
for col in df.select_dtypes(include=['float64','int64']):
    # displot is figure-level and creates its own figure; an extra plt.figure() would leave blank figures
    sns.displot(df[col],kind='kde',height=3)
    plt.show()
In [10]:
for col in df.select_dtypes("object"):
    plt.figure()
    df[col].value_counts().plot.pie()
    plt.show()
[Five pie charts: value counts of the categorical columns MouseID, Genotype, Treatment, Behavior, class]

A bit of data engineering ...

In [11]:
for col in df.select_dtypes("object"):
    print(f'{col :-<50} {df[col].unique()}')
MouseID------------------------------------------- ['309_1' '309_2' '309_3' ... 'J3295_13' 'J3295_14' 'J3295_15']
Genotype------------------------------------------ ['Control' 'Ts65Dn']
Treatment----------------------------------------- ['Memantine' 'Saline']
Behavior------------------------------------------ ['C/S' 'S/C']
class--------------------------------------------- ['c-CS-m' 'c-SC-m' 'c-CS-s' 'c-SC-s' 't-CS-m' 't-SC-m' 't-CS-s' 't-SC-s']
In [12]:
def encoding(df):
    code = {'Control':1,
            'Ts65Dn':0,
            'Memantine':1,
            'Saline':0,
            'C/S':0,
            'S/C':1,
            'c-CS-m':0,
            'c-SC-m':1,
            'c-CS-s':2,
            'c-SC-s':3,
            't-CS-m':4,
            't-SC-m':5,
            't-CS-s':6,
            't-SC-s':7,
           }
    for col in df.select_dtypes('object'):
        df.loc[:,col]=df[col].map(code)
        
    return df

def imputation(df):
    
    #df = df.dropna(axis=0)
    df = df.fillna(df.mean(numeric_only=True)) # column-mean imputation on the numeric features
    
    return df

def feature_engineering(df):
    useless_columns = ['MouseID']
    for feature in useless_columns:
        if feature in df:
            df = df.drop(feature,axis=1)
    return df
In [13]:
def preprocessing(df):
    df = encoding(df)
    df = feature_engineering(df)
    df = imputation(df)
    
    X = df.drop('class',axis=1)
    y = df['class'].astype(int)
      
    return df,X,y
In [14]:
df=data.copy()
df,X,y = preprocessing(df)
df.head()
Out[14]:
DYRK1A_N ITSN1_N BDNF_N NR1_N NR2A_N pAKT_N pBRAF_N pCAMKII_N pCREB_N pELK_N pERK_N pJNK_N PKCA_N pMEK_N pNR1_N pNR2A_N pNR2B_N pPKCAB_N pRSK_N AKT_N BRAF_N CAMKII_N CREB_N ELK_N ERK_N GSK3B_N JNK_N MEK_N TRKA_N RSK_N APP_N Bcatenin_N SOD1_N MTOR_N P38_N pMTOR_N DSCR1_N AMPKA_N NR2B_N pNUMB_N RAPTOR_N TIAM1_N pP70S6_N NUMB_N P70S6_N pGSK3B_N pPKCG_N CDK5_N S6_N ADARB1_N AcetylH3K9_N RRP1_N BAX_N ARC_N ERBB4_N nNOS_N Tau_N GFAP_N GluR3_N GluR4_N IL1B_N P3525_N pCASP9_N PSD95_N SNCA_N Ubiquitin_N pGSK3B_Tyr216_N SHH_N BAD_N BCL2_N pS6_N pCFOS_N SYP_N H3AcK18_N EGR1_N H3MeK4_N CaNA_N Genotype Treatment Behavior class
0 0.503644 0.747193 0.430175 2.816329 5.990152 0.218830 0.177565 2.373744 0.232224 1.750936 0.687906 0.306382 0.402698 0.296927 1.022060 0.605673 1.877684 2.308745 0.441599 0.859366 0.416289 0.369608 0.178944 1.866358 3.685247 1.537227 0.264526 0.319677 0.813866 0.165846 0.453910 3.037621 0.369510 0.458539 0.335336 0.825192 0.576916 0.448099 0.586271 0.394721 0.339571 0.482864 0.294170 0.182150 0.842725 0.192608 1.443091 0.294700 0.354605 1.339070 0.170119 0.159102 0.188852 0.106305 0.144989 0.176668 0.125190 0.115291 0.228043 0.142756 0.430957 0.247538 1.603310 2.014875 0.108234 1.044979 0.831557 0.188852 0.122652 0.134762 0.106305 0.108336 0.427099 0.114783 0.131790 0.128186 1.675652 1 1 0 0
1 0.514617 0.689064 0.411770 2.789514 5.685038 0.211636 0.172817 2.292150 0.226972 1.596377 0.695006 0.299051 0.385987 0.281319 0.956676 0.587559 1.725774 2.043037 0.445222 0.834659 0.400364 0.356178 0.173680 1.761047 3.485287 1.509249 0.255727 0.304419 0.780504 0.157194 0.430940 2.921882 0.342279 0.423560 0.324835 0.761718 0.545097 0.420876 0.545097 0.368255 0.321959 0.454519 0.276431 0.182086 0.847615 0.194815 1.439460 0.294060 0.354548 1.306323 0.171427 0.158129 0.184570 0.106592 0.150471 0.178309 0.134275 0.118235 0.238073 0.142037 0.457156 0.257632 1.671738 2.004605 0.109749 1.009883 0.849270 0.200404 0.116682 0.134762 0.106592 0.104315 0.441581 0.111974 0.135103 0.131119 1.743610 1 1 0 0
2 0.509183 0.730247 0.418309 2.687201 5.622059 0.209011 0.175722 2.283337 0.230247 1.561316 0.677348 0.291276 0.381002 0.281710 1.003635 0.602449 1.731873 2.017984 0.467668 0.814329 0.399847 0.368089 0.173905 1.765544 3.571456 1.501244 0.259614 0.311747 0.785154 0.160895 0.423187 2.944136 0.343696 0.425005 0.324852 0.757031 0.543620 0.404630 0.552994 0.363880 0.313086 0.447197 0.256648 0.184388 0.856166 0.200737 1.524364 0.301881 0.386087 1.279600 0.185456 0.148696 0.190532 0.108303 0.145330 0.176213 0.132560 0.117760 0.244817 0.142445 0.510472 0.255343 1.663550 2.016831 0.108196 0.996848 0.846709 0.193685 0.118508 0.134762 0.108303 0.106219 0.435777 0.111883 0.133362 0.127431 1.926427 1 1 0 0
3 0.442107 0.617076 0.358626 2.466947 4.979503 0.222886 0.176463 2.152301 0.207004 1.595086 0.583277 0.296729 0.377087 0.313832 0.875390 0.520293 1.566852 2.132754 0.477671 0.727705 0.385639 0.362970 0.179449 1.286277 2.970137 1.419710 0.259536 0.279218 0.734492 0.162210 0.410615 2.500204 0.344509 0.429211 0.330121 0.746980 0.546763 0.386860 0.547849 0.366771 0.328492 0.442650 0.398534 0.161768 0.760234 0.184169 1.612382 0.296382 0.290680 1.198765 0.159799 0.166112 0.185323 0.103184 0.140656 0.163804 0.123210 0.117439 0.234947 0.145068 0.430996 0.251103 1.484624 1.957233 0.119883 0.990225 0.833277 0.192112 0.132781 0.134762 0.103184 0.111262 0.391691 0.130405 0.147444 0.146901 1.700563 1 1 0 0
4 0.434940 0.617430 0.358802 2.365785 4.718679 0.213106 0.173627 2.134014 0.192158 1.504230 0.550960 0.286961 0.363502 0.277964 0.864912 0.507990 1.480059 2.013697 0.483416 0.687794 0.367531 0.355311 0.174836 1.324695 2.896334 1.359876 0.250705 0.273667 0.702699 0.154827 0.398550 2.456560 0.329126 0.408755 0.313415 0.691956 0.536860 0.360816 0.512824 0.351551 0.312206 0.419095 0.393447 0.160200 0.768113 0.185718 1.645807 0.296829 0.309345 1.206995 0.164650 0.160687 0.188221 0.104784 0.141983 0.167710 0.136838 0.116048 0.255528 0.140871 0.481227 0.251773 1.534835 2.009109 0.119524 0.997775 0.878668 0.205604 0.129954 0.134762 0.104784 0.110694 0.434154 0.118481 0.140314 0.148380 1.839730 1 1 0 0
In [15]:
c_CS_m = df[y == 0]
c_SC_m = df[y == 1]
c_CS_s = df[y == 2]
c_SC_s = df[y == 3]
t_cs_m = df[y == 4]
t_SC_m = df[y == 5]
t_CS_s = df[y == 6]
t_SC_s = df[y == 7]

Detailed analysis

In [16]:
corr = df.corr(method='pearson').abs()

fig = plt.figure(figsize=(30,20))
# |r| lies in [0, 1], so use a sequential colormap; annotations are unreadable on an 81x81 grid
sns.heatmap(corr, cmap='viridis', vmin=0, vmax=1)
plt.title('Absolute Pearson Correlation')
plt.show()
[Figure: 81 × 81 absolute Pearson correlation heatmap]
In [17]:
df.corr()['class'].abs().sort_values()
Out[17]:
DSCR1_N            0.000405
pBRAF_N            0.001241
JNK_N              0.011161
ELK_N              0.011574
pPKCAB_N           0.014082
P70S6_N            0.014352
pCASP9_N           0.017277
AKT_N              0.021164
GluR4_N            0.022090
SNCA_N             0.022768
RSK_N              0.023023
PSD95_N            0.023064
CAMKII_N           0.023299
pJNK_N             0.026123
nNOS_N             0.026994
DYRK1A_N           0.027169
BCL2_N             0.030229
PKCA_N             0.038828
BDNF_N             0.040251
CREB_N             0.041921
pCFOS_N            0.043059
pCAMKII_N          0.047267
ERBB4_N            0.055579
ARC_N              0.055598
pS6_N              0.055598
ITSN1_N            0.059088
Bcatenin_N         0.061481
pGSK3B_Tyr216_N    0.061832
SHH_N              0.067863
CaNA_N             0.068317
BAD_N              0.068883
GSK3B_N            0.070202
NR1_N              0.071079
TRKA_N             0.071720
BRAF_N             0.072357
RAPTOR_N           0.079602
GFAP_N             0.080342
ERK_N              0.081793
CDK5_N             0.082622
Ubiquitin_N        0.082880
pELK_N             0.088713
EGR1_N             0.088915
pMEK_N             0.089968
ADARB1_N           0.090135
RRP1_N             0.096068
IL1B_N             0.096544
pAKT_N             0.098500
pNR2B_N            0.105142
BAX_N              0.107021
pGSK3B_N           0.108195
MEK_N              0.109355
P38_N              0.109655
pRSK_N             0.120650
pERK_N             0.133375
pNR1_N             0.139379
pNUMB_N            0.141955
TIAM1_N            0.145125
NUMB_N             0.146878
NR2B_N             0.147627
pMTOR_N            0.153278
MTOR_N             0.157890
H3MeK4_N           0.162646
pNR2A_N            0.162873
AMPKA_N            0.164279
NR2A_N             0.165538
P3525_N            0.178535
SYP_N              0.186339
pCREB_N            0.235717
H3AcK18_N          0.240085
SOD1_N             0.244898
pP70S6_N           0.251836
Behavior           0.255085
GluR3_N            0.282559
S6_N               0.305409
AcetylH3K9_N       0.306736
pPKCG_N            0.333035
Tau_N              0.349558
APP_N              0.430990
Treatment          0.436982
Genotype           0.871617
class              1.000000
Name: class, dtype: float64
In [18]:
for col in df.columns:
    plt.figure(figsize=(4,4))
    # distplot is deprecated in recent seaborn; kdeplot draws the same density curves
    for subset, label in [(c_CS_m,'c_CS_m'),(c_SC_m,'c_SC_m'),
                          (c_CS_s,'c_CS_s'),(c_SC_s,'c_SC_s'),
                          (t_cs_m,'t_CS_m'),(t_SC_m,'t_SC_m'),
                          (t_CS_s,'t_CS_s'),(t_SC_s,'t_SC_s')]:
        sns.kdeplot(subset[col], label=label)
    plt.legend()
    plt.show()
[81 figures: per-class density plot for each column]

Modelling

In [19]:
from sklearn.model_selection import train_test_split
df = data.copy()
trainset, testset = train_test_split(df, test_size=0.2, random_state=0)
print(trainset['class'].value_counts())
print(testset['class'].value_counts())
c-CS-m    126
c-SC-m    125
t-SC-m    111
t-SC-s    109
c-CS-s    108
c-SC-s    104
t-CS-m    103
t-CS-s     78
Name: class, dtype: int64
t-CS-m    32
c-SC-s    31
t-CS-s    27
c-CS-s    27
t-SC-s    26
c-SC-m    25
c-CS-m    24
t-SC-m    24
Name: class, dtype: int64
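The counts above show that a plain random split does not preserve the class proportions. Passing `stratify` to `train_test_split` fixes that; a minimal sketch on a toy frame (a stand-in — in the notebook you would pass `df` and `stratify=df['class']`):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with two equally sized classes standing in for df.
demo = pd.DataFrame({'x': range(40), 'class': ['a'] * 20 + ['b'] * 20})

# stratify keeps the class proportions identical in train and test.
train, test = train_test_split(demo, test_size=0.2, random_state=0,
                               stratify=demo['class'])
```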
In [20]:
_, X_train, y_train = preprocessing(trainset)
_, X_test, y_test = preprocessing(testset)
In [21]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, RobustScaler
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
In [22]:
preprocessor = make_pipeline(StandardScaler())

PCAPipeline = make_pipeline(preprocessor, PCA(n_components=2,random_state=0))

RandomPipeline = make_pipeline(preprocessor,RandomForestClassifier(random_state=0))
AdaPipeline = make_pipeline(preprocessor,AdaBoostClassifier(random_state=0))
SVMPipeline = make_pipeline(preprocessor,SVC(random_state=0,probability=True))
KNNPipeline = make_pipeline(preprocessor,KNeighborsClassifier())
LRPipeline = make_pipeline(preprocessor,LogisticRegression(solver='sag'))
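Before the full evaluation below, a quick cross-validated score gives a first ranking of the pipelines. A sketch with synthetic stand-in data (the exact scores depend on the real features; in the notebook, pass `X_train`/`y_train` and loop over the pipelines):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic multi-class stand-in for the mice data.
X_demo, y_demo = make_classification(n_samples=200, n_features=20,
                                     n_informative=10, n_classes=4,
                                     random_state=0)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X_demo, y_demo, cv=4, scoring='accuracy')
```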

PCA Analysis

In [23]:
PCA_df = pd.DataFrame(PCAPipeline.fit_transform(X))
PCA_df = pd.concat([PCA_df, data['class']], axis=1)
PCA_df.head()
Out[23]:
0 1 class
0 4.643039 5.456956 c-CS-m
1 3.028412 5.515211 c-CS-m
2 3.144482 5.717241 c-CS-m
3 0.643212 4.072579 c-CS-m
4 -0.406231 4.367141 c-CS-m
In [24]:
plt.figure(figsize=(8,8))
sns.scatterplot(x=PCA_df[0], y=PCA_df[1], hue=PCA_df['class'], palette=sns.color_palette("Paired", 8))
plt.show()
[Figure: scatter of the first two principal components, colored by class]
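Before reading much into a 2-component projection, it is worth checking how much variance those two axes actually carry. A sketch on synthetic data (in the notebook, the fitted PCA can be reached via `PCAPipeline.named_steps['pca']`; the exact ratios depend on the real features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 10))
X_demo[:, 0] *= 5.0  # give the data one dominant direction

pca = PCA(n_components=2, random_state=0)
pca.fit(StandardScaler().fit_transform(X_demo))

# Fraction of total variance carried by each retained component.
ratios = pca.explained_variance_ratio_
```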

Classification problem

In [25]:
dict_of_models = {'RandomForest': RandomPipeline,
'AdaBoost': AdaPipeline,
'SVM': SVMPipeline,
'KNN': KNNPipeline,
'LR': LRPipeline}
In [26]:
from sklearn.metrics import f1_score, accuracy_score, confusion_matrix, classification_report, roc_curve
from sklearn.model_selection import learning_curve, cross_val_score, GridSearchCV

def evaluation(model):
    model.fit(X_train, y_train)
    # class probabilities for each test sample
    y_pred_proba = model.predict_proba(X_test)

    # predicted labels: argmax over the probability columns
    # (equivalent to model.predict here, since the classes are the sorted integers 0..7)
    y_pred = np.argmax(y_pred_proba,axis=1)
    print('Accuracy = ', accuracy_score(y_test, y_pred))
    print('-')
    print(confusion_matrix(y_test,y_pred))
    print('-')
    print(classification_report(y_test,y_pred))
    print('-')
    
    N, train_score, val_score = learning_curve(model, X_train, y_train, cv=4, scoring='accuracy', train_sizes=np.linspace(0.1,1,10))
    
    plt.figure(figsize=(8,6))
    plt.plot(N, train_score.mean(axis=1), label='train score')
    plt.plot(N, val_score.mean(axis=1), label='validation score')
    plt.legend()
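One caveat worth keeping in mind when the test scores look perfect: each mouse contributes 15 rows, so a row-level random split puts measurements of the same mouse in both train and test. A group-aware split keyed on the mouse part of `MouseID` keeps every mouse on one side only. A minimal sketch on a toy frame (the `MouseID` pattern follows the dataset's `309_1` style; in the notebook, the groups come from `data['MouseID']`):

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy frame: 4 mice x 3 measurements each.
demo = pd.DataFrame({
    'MouseID': [f'{m}_{i}' for m in ('309', '311', '320', '322') for i in (1, 2, 3)],
    'value': range(12),
})
groups = demo['MouseID'].str.split('_').str[0]  # mouse identifier without the measurement index

# Hold out 25 % of the mice (not of the rows).
gss = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(gss.split(demo, groups=groups))

train_mice = set(groups.iloc[train_idx])
test_mice = set(groups.iloc[test_idx])
```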
In [27]:
for name, model in dict_of_models.items():
    print('---------------------------------')
    print(name)
    evaluation(model)
---------------------------------
RandomForest
Accuracy =  1.0
-
[[24  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 27  0  0  0  0  0]
 [ 0  0  0 31  0  0  0  0]
 [ 0  0  0  0 32  0  0  0]
 [ 0  0  0  0  0 24  0  0]
 [ 0  0  0  0  0  0 27  0]
 [ 0  0  0  0  0  0  0 26]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        24
           1       1.00      1.00      1.00        25
           2       1.00      1.00      1.00        27
           3       1.00      1.00      1.00        31
           4       1.00      1.00      1.00        32
           5       1.00      1.00      1.00        24
           6       1.00      1.00      1.00        27
           7       1.00      1.00      1.00        26

    accuracy                           1.00       216
   macro avg       1.00      1.00      1.00       216
weighted avg       1.00      1.00      1.00       216

-
---------------------------------
AdaBoost
Accuracy =  1.0
-
[[24  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 27  0  0  0  0  0]
 [ 0  0  0 31  0  0  0  0]
 [ 0  0  0  0 32  0  0  0]
 [ 0  0  0  0  0 24  0  0]
 [ 0  0  0  0  0  0 27  0]
 [ 0  0  0  0  0  0  0 26]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        24
           1       1.00      1.00      1.00        25
           2       1.00      1.00      1.00        27
           3       1.00      1.00      1.00        31
           4       1.00      1.00      1.00        32
           5       1.00      1.00      1.00        24
           6       1.00      1.00      1.00        27
           7       1.00      1.00      1.00        26

    accuracy                           1.00       216
   macro avg       1.00      1.00      1.00       216
weighted avg       1.00      1.00      1.00       216

-
---------------------------------
SVM
Accuracy =  1.0
-
[[24  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 27  0  0  0  0  0]
 [ 0  0  0 31  0  0  0  0]
 [ 0  0  0  0 32  0  0  0]
 [ 0  0  0  0  0 24  0  0]
 [ 0  0  0  0  0  0 27  0]
 [ 0  0  0  0  0  0  0 26]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        24
           1       1.00      1.00      1.00        25
           2       1.00      1.00      1.00        27
           3       1.00      1.00      1.00        31
           4       1.00      1.00      1.00        32
           5       1.00      1.00      1.00        24
           6       1.00      1.00      1.00        27
           7       1.00      1.00      1.00        26

    accuracy                           1.00       216
   macro avg       1.00      1.00      1.00       216
weighted avg       1.00      1.00      1.00       216

-
---------------------------------
KNN
Accuracy =  0.9768518518518519
-
[[24  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 1  0 26  0  0  0  0  0]
 [ 0  0  0 31  0  0  0  0]
 [ 0  0  0  0 32  0  0  0]
 [ 0  0  0  2  0 22  0  0]
 [ 1  0  0  0  0  0 25  1]
 [ 0  0  0  0  0  0  0 26]]
-
              precision    recall  f1-score   support

           0       0.92      1.00      0.96        24
           1       1.00      1.00      1.00        25
           2       1.00      0.96      0.98        27
           3       0.94      1.00      0.97        31
           4       1.00      1.00      1.00        32
           5       1.00      0.92      0.96        24
           6       1.00      0.93      0.96        27
           7       0.96      1.00      0.98        26

    accuracy                           0.98       216
   macro avg       0.98      0.98      0.98       216
weighted avg       0.98      0.98      0.98       216

-
---------------------------------
LR
Accuracy =  1.0
-
[[24  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 27  0  0  0  0  0]
 [ 0  0  0 31  0  0  0  0]
 [ 0  0  0  0 32  0  0  0]
 [ 0  0  0  0  0 24  0  0]
 [ 0  0  0  0  0  0 27  0]
 [ 0  0  0  0  0  0  0 26]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        24
           1       1.00      1.00      1.00        25
           2       1.00      1.00      1.00        27
           3       1.00      1.00      1.00        31
           4       1.00      1.00      1.00        32
           5       1.00      1.00      1.00        24
           6       1.00      1.00      1.00        27
           7       1.00      1.00      1.00        26

    accuracy                           1.00       216
   macro avg       1.00      1.00      1.00       216
weighted avg       1.00      1.00      1.00       216

-
[Figures: learning curves (train vs. validation accuracy) for each of the five models]

Mid-conclusion : 100% accuracy on most models¶

For the five models tested above, the test-set accuracies are:

  • KNN : 98%
  • SVM / RandomForest / AdaBoost / LogisticRegression : 100%
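A perfect hold-out score deserves a second opinion, and k-fold cross-validation on the training set is a cheap one: instead of a single number it yields a mean and a spread across folds. A hedged sketch on synthetic data (the pipeline here is an assumption, not the notebook's exact preprocessor):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for (X_train, y_train): 8 classes, 77 features.
X, y = make_classification(n_samples=800, n_features=77, n_informative=20,
                           n_classes=8, random_state=0)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Accuracy per fold; the std shows how stable the score is.
scores = cross_val_score(pipe, X, y, cv=5, scoring='accuracy')
print(scores.mean(), scores.std())
```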

Idea 1 : Split the data into 2 groups : control mice "c-" and trisomy mice "t-"¶

The idea here is to identify, within each group, what was injected into the mouse and whether it was stimulated to learn¶

(Within each group the genotype is fixed, so only treatment and behaviour remain to be predicted)

In [28]:
Control_df = data.loc[data['class'].str.startswith('c', na=False)]
Trisomy_df = data.loc[data['class'].str.startswith('t', na=False)]
print(Trisomy_df['class'].unique())
['t-CS-m' 't-SC-m' 't-CS-s' 't-SC-s']

Control mice "C-"¶

In [29]:
Control_df,X,y = preprocessing(Control_df)
trainset, testset = train_test_split(Control_df, test_size=0.2, random_state=0)
print(trainset['class'].value_counts())
print(testset['class'].value_counts())
0    122
1    122
3    110
2    102
Name: class, dtype: int64
2    33
0    28
1    28
3    25
Name: class, dtype: int64
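The class counts above happen to be reasonably balanced across the random split; when they are not, `train_test_split` can stratify on the label so both splits preserve the class proportions. A small sketch (the frame is hypothetical, standing in for `Control_df`):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with a 'class' label column: 4 classes, 30 rows each.
df = pd.DataFrame({'class': [0, 1, 2, 3] * 30, 'x': range(120)})

trainset, testset = train_test_split(df, test_size=0.2, random_state=0,
                                     stratify=df['class'])
print(testset['class'].value_counts())  # exactly 6 of each class
```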
In [30]:
_, X_train, y_train = preprocessing(trainset)
_, X_test, y_test = preprocessing(testset)

PCA Analysis¶

In [31]:
PCA_df = pd.DataFrame(PCAPipeline.fit_transform(X))
PCA_df.reset_index(drop=True, inplace=True)
y.reset_index(drop=True, inplace=True)
PCA_df = pd.concat([PCA_df, y], axis=1)
PCA_df.head()
Out[31]:
0 1 class
0 6.208842 -4.046564 0
1 4.736616 -4.661880 0
2 4.915870 -4.728123 0
3 2.018200 -4.454860 0
4 1.128297 -5.048759 0
In [32]:
plt.figure(figsize=(8,8))
sns.scatterplot(x=PCA_df[0], y=PCA_df[1], hue=PCA_df['class'], palette=sns.color_palette("Paired", 4))
plt.show()
[Figure: PCA scatter plot of the control mice, coloured by the 4 classes]

Models evaluation¶

In [33]:
for name, model in dict_of_models.items():
    print('---------------------------------')
    print(name)
    evaluation(model)
---------------------------------
RandomForest
Accuracy =  1.0
-
[[28  0  0  0]
 [ 0 28  0  0]
 [ 0  0 33  0]
 [ 0  0  0 25]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        28
           1       1.00      1.00      1.00        28
           2       1.00      1.00      1.00        33
           3       1.00      1.00      1.00        25

    accuracy                           1.00       114
   macro avg       1.00      1.00      1.00       114
weighted avg       1.00      1.00      1.00       114

-
---------------------------------
AdaBoost
Accuracy =  1.0
-
[[28  0  0  0]
 [ 0 28  0  0]
 [ 0  0 33  0]
 [ 0  0  0 25]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        28
           1       1.00      1.00      1.00        28
           2       1.00      1.00      1.00        33
           3       1.00      1.00      1.00        25

    accuracy                           1.00       114
   macro avg       1.00      1.00      1.00       114
weighted avg       1.00      1.00      1.00       114

-
---------------------------------
SVM
Accuracy =  0.9912280701754386
-
[[28  0  0  0]
 [ 0 28  0  0]
 [ 0  0 33  0]
 [ 0  0  1 24]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        28
           1       1.00      1.00      1.00        28
           2       0.97      1.00      0.99        33
           3       1.00      0.96      0.98        25

    accuracy                           0.99       114
   macro avg       0.99      0.99      0.99       114
weighted avg       0.99      0.99      0.99       114

-
---------------------------------
KNN
Accuracy =  0.9736842105263158
-
[[28  0  0  0]
 [ 0 28  0  0]
 [ 2  0 31  0]
 [ 0  1  0 24]]
-
              precision    recall  f1-score   support

           0       0.93      1.00      0.97        28
           1       0.97      1.00      0.98        28
           2       1.00      0.94      0.97        33
           3       1.00      0.96      0.98        25

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.98      0.97      0.97       114

-
---------------------------------
LR
Accuracy =  1.0
-
[[28  0  0  0]
 [ 0 28  0  0]
 [ 0  0 33  0]
 [ 0  0  0 25]]
-
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        28
           1       1.00      1.00      1.00        28
           2       1.00      1.00      1.00        33
           3       1.00      1.00      1.00        25

    accuracy                           1.00       114
   macro avg       1.00      1.00      1.00       114
weighted avg       1.00      1.00      1.00       114

-
[Figures: learning curves (train vs. validation accuracy) for each of the five models]

Trisomy mice "T-" (Work in progress)¶

In [34]:
Trisomy_df,X,y = preprocessing(Trisomy_df)
trainset, testset = train_test_split(Trisomy_df, test_size=0.2, random_state=0)
print(trainset['class'].value_counts())
print(testset['class'].value_counts())
5    110
4    109
7    108
6     81
Name: class, dtype: int64
7    27
4    26
5    25
6    24
Name: class, dtype: int64
In [35]:
_, X_train, y_train = preprocessing(trainset)
_, X_test, y_test = preprocessing(testset)
In [36]:
y_train.head()
Out[36]:
892    6
942    6
641    4
965    7
576    4
Name: class, dtype: int64

PCA Analysis¶

In [37]:
PCA_df = pd.DataFrame(PCAPipeline.fit_transform(X))
PCA_df.reset_index(drop=True, inplace=True)
y.reset_index(drop=True, inplace=True)
PCA_df = pd.concat([PCA_df, y], axis=1)
PCA_df.head()
Out[37]:
0 1 class
0 -1.019302 7.414716 4
1 0.148803 8.098498 4
2 0.629234 8.159883 4
3 -3.759011 5.464395 4
4 -2.490917 6.301093 4
In [38]:
plt.figure(figsize=(8,8))
sns.scatterplot(x=PCA_df[0], y=PCA_df[1], hue=PCA_df['class'], palette=sns.color_palette("Paired", 4))
plt.show()
[Figure: PCA scatter plot of the trisomy mice, coloured by the 4 classes]

Models evaluation¶

In [39]:
for name, model in dict_of_models.items():
    print('---------------------------------')
    print(name)
    evaluation(model)
---------------------------------
RandomForest
Accuracy =  0.0
-
[[ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [26  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 24  0  0  0  0  0]
 [ 0  0  0 27  0  0  0  0]]
-
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       0.0
           1       0.00      0.00      0.00       0.0
           2       0.00      0.00      0.00       0.0
           3       0.00      0.00      0.00       0.0
           4       0.00      0.00      0.00      26.0
           5       0.00      0.00      0.00      25.0
           6       0.00      0.00      0.00      24.0
           7       0.00      0.00      0.00      27.0

    accuracy                           0.00     102.0
   macro avg       0.00      0.00      0.00     102.0
weighted avg       0.00      0.00      0.00     102.0

-
---------------------------------
AdaBoost
Accuracy =  0.0
-
[[ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [26  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 24  0  0  0  0  0]
 [ 0  0  0 27  0  0  0  0]]
-
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       0.0
           1       0.00      0.00      0.00       0.0
           2       0.00      0.00      0.00       0.0
           3       0.00      0.00      0.00       0.0
           4       0.00      0.00      0.00      26.0
           5       0.00      0.00      0.00      25.0
           6       0.00      0.00      0.00      24.0
           7       0.00      0.00      0.00      27.0

    accuracy                           0.00     102.0
   macro avg       0.00      0.00      0.00     102.0
weighted avg       0.00      0.00      0.00     102.0

-
---------------------------------
SVM
Accuracy =  0.0
-
[[ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [26  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 24  0  0  0  0  0]
 [ 0  0  0 27  0  0  0  0]]
-
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       0.0
           1       0.00      0.00      0.00       0.0
           2       0.00      0.00      0.00       0.0
           3       0.00      0.00      0.00       0.0
           4       0.00      0.00      0.00      26.0
           5       0.00      0.00      0.00      25.0
           6       0.00      0.00      0.00      24.0
           7       0.00      0.00      0.00      27.0

    accuracy                           0.00     102.0
   macro avg       0.00      0.00      0.00     102.0
weighted avg       0.00      0.00      0.00     102.0

-
---------------------------------
KNN
Accuracy =  0.0
-
[[ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [26  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 24  0  0  0  0  0]
 [ 0  0  0 27  0  0  0  0]]
-
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       0.0
           1       0.00      0.00      0.00       0.0
           2       0.00      0.00      0.00       0.0
           3       0.00      0.00      0.00       0.0
           4       0.00      0.00      0.00      26.0
           5       0.00      0.00      0.00      25.0
           6       0.00      0.00      0.00      24.0
           7       0.00      0.00      0.00      27.0

    accuracy                           0.00     102.0
   macro avg       0.00      0.00      0.00     102.0
weighted avg       0.00      0.00      0.00     102.0

-
---------------------------------
LR
Accuracy =  0.0
-
[[ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [ 0  0  0  0  0  0  0  0]
 [26  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0]
 [ 0  0 24  0  0  0  0  0]
 [ 0  0  0 27  0  0  0  0]]
-
              precision    recall  f1-score   support

           0       0.00      0.00      0.00       0.0
           1       0.00      0.00      0.00       0.0
           2       0.00      0.00      0.00       0.0
           3       0.00      0.00      0.00       0.0
           4       0.00      0.00      0.00      26.0
           5       0.00      0.00      0.00      25.0
           6       0.00      0.00      0.00      24.0
           7       0.00      0.00      0.00      27.0

    accuracy                           0.00     102.0
   macro avg       0.00      0.00      0.00     102.0
weighted avg       0.00      0.00      0.00     102.0

-
[Figures: learning curves (train vs. validation accuracy) for each of the five models]
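The 0% accuracy above is an evaluation artifact, not a modelling failure: `np.argmax` over `predict_proba` returns column indices 0-3, while the trisomy labels are 4-7, so every prediction is "wrong" by construction (hence the off-diagonal band in every confusion matrix). Mapping the argmax through `model.classes_`, or simply calling `model.predict`, fixes it. A toy demonstration with labels that start at 4:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data whose labels start at 4, like the trisomy subset (classes 4-7).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(4, 8, size=200)

model = LogisticRegression(max_iter=1000).fit(X, y)
proba = model.predict_proba(X)

buggy = np.argmax(proba, axis=1)                   # column indices 0..3, never 4..7
fixed = model.classes_[np.argmax(proba, axis=1)]   # actual labels 4..7

print((fixed == model.predict(X)).all())
```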

Idea 2 : Split the data into 4 groups : "CS-m" "CS-s" "SC-m" "SC-s"¶

The idea here is to identify whether the mouse is trisomic or not¶

(Within each group the treatment (saline or memantine) and the behaviour (stimulated or not) are fixed, so genotype is the only target left to predict)

Not Implemented
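One way these four groups could be built, assuming the class labels keep the `c-CS-m` format used above: strip the leading genotype token and group on the remaining behaviour-treatment suffix. A sketch (the label series is hypothetical):

```python
import pandas as pd

# Hypothetical labels in the dataset's 'class' format.
labels = pd.Series(['c-CS-m', 't-CS-m', 'c-SC-s', 't-SC-s', 'c-CS-s', 't-SC-m'])

# Split off the genotype prefix ('c' or 't'); keep 'CS-m', 'SC-s', ...
groups = labels.str.split('-', n=1).str[1]
print(sorted(groups.unique()))  # ['CS-m', 'CS-s', 'SC-m', 'SC-s']
```

Filtering the full frame on each suffix would then give the four sub-datasets, with the genotype prefix as the binary target inside each.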

In [ ]: